Analyzers
The analyzer ES uses by default handles Chinese poorly and is a bad fit for Chinese-language sites. To get better search results, the default analyzer needs to be replaced with one that tokenizes Chinese properly.
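To see the problem concretely, you can ask ES to analyze Chinese text with the built-in standard analyzer. This is a sketch, assuming an ES instance reachable on localhost:9200; the sample text is the same one used in the test later in this post:

```shell
# Analyze Chinese text with the built-in "standard" analyzer.
# Assumes ES is reachable on localhost:9200 (adjust for your setup).
BODY='{"analyzer":"standard","text":"这个年轻人不简单"}'
curl -s -H 'Content-Type: application/json' \
     -X GET 'http://localhost:9200/_analyze' -d "$BODY"
# The standard analyzer typically emits one token per Chinese character
# (这 / 个 / 年 / 轻 / 人 ...), losing word boundaries entirely.
```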
Using the IK Analyzer
Installation
My ES runs in Docker. Many of the tutorials online turned out to be unreliable, so after some tinkering I'm recording the working configuration here for future use.
The IK analyzer version must exactly match the installed ES version, just as with Kibana.
Go to https://github.com/medcl/elasticsearch-analysis-ik/releases/tag/v6.8.0 and pick the IK release that matches your ES version; mine is 6.8.0.
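Before downloading, it's worth confirming the exact ES version. A hedged sketch: the root endpoint of ES returns a JSON document containing a `version.number` field, which can be extracted with grep/sed (the live `curl` assumes ES on localhost:9200; the sample reply below is illustrative):

```shell
# Extract the "number" field from the JSON that ES returns at its root endpoint.
extract_es_version() {
  grep -o '"number"[[:space:]]*:[[:space:]]*"[^"]*"' | head -n1 \
    | sed 's/.*"\([^"]*\)"$/\1/'
}

# Live check (assumes ES on localhost:9200):
#   curl -s localhost:9200 | extract_es_version
# Example with a captured reply:
SAMPLE='{"version":{"number":"6.8.0","build_flavor":"default"}}'
ES_VERSION=$(echo "$SAMPLE" | extract_es_version)
echo "$ES_VERSION"   # prints 6.8.0
```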
After entering the container, run cd plugins/; the directory you land in is actually /usr/share/elasticsearch/plugins:
[root@iz8vbfwigxwd3shlj9smd0z ~]# docker exec -it d886a17edc8f /bin/bash
[root@d886a17edc8f elasticsearch]# cd plugins/
[root@d886a17edc8f plugins]# pwd
/usr/share/elasticsearch/plugins
Create an ik directory and cd into it:
[root@d886a17edc8f plugins]# ls
[root@d886a17edc8f plugins]# mkdir ik
[root@d886a17edc8f plugins]# ls
ik
[root@d886a17edc8f plugins]# cd ik
Install IK online; again, watch that the versions match:
[root@d886a17edc8f plugins]# wget https://github.wuyanzheshui.workers.dev/medcl/elasticsearch-analysis-ik/releases/download/v6.8.0/elasticsearch-analysis-ik-6.8.0.zip
Unzip into the current directory (i.e. ik), then exit and restart the container:
[root@d886a17edc8f ik]# unzip elasticsearch-analysis-ik-6.8.0.zip
Archive: elasticsearch-analysis-ik-6.8.0.zip
creating: config/
inflating: config/quantifier.dic
inflating: config/stopword.dic
inflating: config/preposition.dic
inflating: config/main.dic
inflating: config/extra_main.dic
inflating: config/IKAnalyzer.cfg.xml
inflating: config/extra_single_word_full.dic
inflating: config/extra_stopword.dic
inflating: config/extra_single_word_low_freq.dic
inflating: config/suffix.dic
inflating: config/surname.dic
inflating: config/extra_single_word.dic
inflating: elasticsearch-analysis-ik-6.8.0.jar
inflating: httpclient-4.5.2.jar
inflating: httpcore-4.4.4.jar
inflating: commons-logging-1.2.jar
inflating: commons-codec-1.9.jar
inflating: plugin-descriptor.properties
inflating: plugin-security.policy
[root@d886a17edc8f ik]# ls
commons-codec-1.9.jar elasticsearch-analysis-ik-6.8.0.jar httpcore-4.4.4.jar
commons-logging-1.2.jar elasticsearch-analysis-ik-6.8.0.zip plugin-descriptor.properties
config httpclient-4.5.2.jar plugin-security.policy
[root@d886a17edc8f ik]# exit
exit
[root@iz8vbfwigxwd3shlj9smd0z ~]# docker restart d886a17edc8f
d886a17edc8f
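As an alternative to the manual wget-and-unzip route above, ES's bundled plugin CLI can install the same zip in one step. This is an untested sketch; the container id and version are the ones used in this post:

```shell
# Install IK through elasticsearch-plugin inside the container, then restart.
IK_URL='https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.8.0/elasticsearch-analysis-ik-6.8.0.zip'
docker exec -it d886a17edc8f bin/elasticsearch-plugin install "$IK_URL"
docker restart d886a17edc8f
```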
Testing
Note: the analyzer is named ik_max_word.
GET /ems/_analyze
{
  "analyzer": "ik_max_word",
  "text": "这个年轻人不简单"
}
# Tokenization result
{
  "tokens" : [
    {
      "token" : "这个",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "年轻人",
      "start_offset" : 2,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "年轻",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "人",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "CN_CHAR",
      "position" : 3
    },
    {
      "token" : "不简单",
      "start_offset" : 5,
      "end_offset" : 8,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "不",
      "start_offset" : 5,
      "end_offset" : 6,
      "type" : "CN_CHAR",
      "position" : 5
    },
    {
      "token" : "简单",
      "start_offset" : 6,
      "end_offset" : 8,
      "type" : "CN_WORD",
      "position" : 6
    }
  ]
}
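The same test can be run with curl from the host instead of the Kibana console (a sketch assuming ES on localhost:9200 and the ems index used above). Note that IK also ships a coarser-grained ik_smart analyzer alongside ik_max_word:

```shell
# Reproduce the console request above with curl.
BODY='{"analyzer":"ik_max_word","text":"这个年轻人不简单"}'
curl -s -H 'Content-Type: application/json' \
     -X GET 'http://localhost:9200/ems/_analyze' -d "$BODY"
# Swap "ik_max_word" for "ik_smart" to get fewer, coarser-grained tokens.
```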
With that, the IK analyzer is installed successfully.